NetNews Offline 2

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Offline 2 / NetNews Offline Volume 2.iso / news / comp / lang / c-part1 / 2442 < prev next >

Wrap

Text File | 1996-08-05 | 4.0 KB | 116 lines

Path: keats.ugrad.cs.ubc.ca!not-for-mail From: c2a192@ugrad.cs.ubc.ca (Kazimir Kylheku) Newsgroups: comp.lang.c Subject: Re: I/O Question Date: 21 Jan 1996 07:45:11 -0800 Organization: Computer Science, University of B.C., Vancouver, B.C., Canada Message-ID: <4dtn27INNn43@keats.ugrad.cs.ubc.ca> References: <4dp1nm$mu6@taco.cc.ncsu.edu> NNTP-Posting-Host: keats.ugrad.cs.ubc.ca In article <4dp1nm$mu6@taco.cc.ncsu.edu>, Alex David Groce <adgroce> wrote: >I'm relatively inexperienced in C, was started with C++ here at NC State. >Anyway, working on a project where speed really matters, I'm reading >from a file containing 100001 integers in text format. I was originally >using C++'s iostreams, but discovered the using stdio.h speeded things up >by an incredible factor. I guess the C++ routines are so general that >they're slllooooooooww as molasses. Anyway, I'm using scanf now, as in > >while (scanf("%i", &list[count++]) != EOF); // Read elements. >count--; // Avoid good old OBOB, the Off-By-One-Bug. > >Would it be worth my time to learn buffered I/O techniques, read or >getchar or such, to input this data? Every little bit of time >counts... Or is there a good example somewhere of a fast way to do >this? Try this! %{ /* Lex lexical analyzer */ #include <stdio.h> #define MAXSIZE 16384 int number; %} ws [\n\t ] digit [0-9] integer "-"?{digit}+ %% {ws} { /* skip */ } {integer} { number = atoi(yytext); return 1; } . { /* skip illegal char */ } %% main() { int list[MAXSIZE]; int count = 0; /* open a file, assign it to FILE *yyin */ /* call yylex() to get numbers one by one */ /* reads standard input by default */ while(yylex() && count < MAXSIZE) { /* 'number' contains number that was read from yyin */ list[count++] = number; } } Pump the above through the "lex" (or better, the GNU "flex"), and the compile the generated program "lex.yy.c" and link with the -ll (or -lfl if you used flex) library. The above will make a scanner as fast as the one done with scanf(). I built the program with flex(), and it turned out to be just a bit slower than scanf(), but with flex's -f option to build a fast scanner at the expense of memory, it works almost twice as fast. My test was an 8MB file containing the same number repeated over and over again, created by just doing a "yes 12345678 > file" and letting it run for a few seconds. Here, a.out is the scanf() version, and b.out is the flex -f generated version: turing:~/src/test$ time a.out < testfile 9.26user 0.76system 0:12.43elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (20major+44minor)pagefaults 0swaps turing:~/src/test$ time b.out < testfile 6.70user 0.54system 0:09.58elapsed 75%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (22major+39minor)pagefaults 0swaps turing:~/src/test$ uname -a Linux turing 1.3.42 #22 Sat Jan 13 12:22:51 PST 1996 i586 I did one "dry run" before that just to make sure that one program wasn't getting an unfair advantage due to caching. The regular expression patterns that define the lexical analyzer are compiled into a static table by lex, whereas scanf() has to parse its format string each time you call it, which is wasted overhead if the format string doesn't change from invocation to invocation. I have written many scanners using lex, so I can write dinky things like the above off the top of my head. There are better ways to use lex, though. You can make it more efficient by being clever with how you define the pattern matches. In my experience, a lex-generated scanner is typically at least ten times faster than programs that use scanf() as soon as your scanning starts to become non-trivial. Hand-written code based on the standard IO functions becomes hard to debug and maintain compared to a lex spec (though the latter has its own unique challenges). It's very good for doing things like: writing parsers for configuration files, importing databases from text files, scanning logs, writing lexical analyzers for compilers, text filters and the like. One thing is for sure, though: lex-generated code is bigger than a library call! --